Hello everyone, I'm ready. I'm a graduate student at the Earth-Life Science Institute. In our study, we have tried to understand the scaling of protein function across the tree of life, and the mechanisms that might have led to the species diversity we see on the tree of life right here.

We study scaling using power laws, and power laws are found everywhere. Whether we consider the number of hits on a web page in a given period of time or earthquake magnitudes in an area over a given period of time, many natural and man-made processes follow power laws.

So how can this help us understand some concepts in biology? For that, let's consider a Lego set and look at the number of unique pieces in relation to the total number of pieces in the set. When we plot this on a log-log scale, we see that larger Lego sets use more unique piece types, but they add progressively fewer additional piece types as they grow. So they are becoming more efficient: the larger sets use the same pieces the smaller sets use, but in more efficient and more complex ways. What we're observing in these plots is a scaling relationship, and when we observe a scaling relationship, we can suggest that there may be a set of rules governing the way something is.

This is a power-law equation: one quantity varies as a power of the other, and when we plot it on a log-log scale we get a straight line with slope alpha. Previous studies have shown that the number of genes in a specific functional category scales as a power law of the total number of genes in a genome. For example, transcription regulation scales almost quadratically, which means that if the genome doubles in size, the number of genes in this category roughly quadruples.

We have tried to include an expanded taxonomy in our study, and for that we used the eggNOG database. After power-law fitting, we saw different trends in our data for the smaller and the larger genomes, so we carried out piecewise regression to do justice to the different scaling patterns observed in the plots.

For example, we wanted to capture the variability in slope between the smaller and the larger genome sizes. Take category M, cell wall and cell membrane proteins. The x-axis shows the total protein annotations and the y-axis shows the annotations for that specific category. For the smaller genomes, we can see the proteins scaling fast, which means that as genome size increases, these genomes incorporate more and more proteins, faster than the larger genomes do: after a statistically detected breakpoint, the scaling slows down. We can see a similar trend in category T. We saw this trend at most of the breakpoints that were supported in bacteria. Interestingly, we observed the opposite trend in archaea: the scaling is slow at the start but speeds up after the statistically detected breakpoint. Many categories were common to archaea and bacteria, but some categories were present exclusively in either archaea or bacteria.
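
The talk does not spell out the fitting procedure beyond power-law fitting and piecewise regression with a statistically detected breakpoint. As a rough illustration only, here is a minimal Python sketch under two assumptions: that a power law y = c·x^α becomes the straight line log y = log c + α·log x on a log-log plot, so α can be estimated by ordinary least squares on log-transformed counts; and that a breakpoint can be approximated by a grid search for the split that minimizes the combined squared error of two separate line fits. The function names (`fit_line`, `fit_power_law`, `fit_piecewise`) are hypothetical, and the grid search is a stand-in, not the study's statistical breakpoint test.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a + b*x; returns (a, b, sse)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    sse = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    return intercept, slope, sse


def fit_power_law(total, category):
    """Fit category ~ c * total**alpha by OLS on log10 counts.

    Assumes every count is positive (log10(0) is undefined).
    """
    log_x = [math.log10(t) for t in total]
    log_y = [math.log10(k) for k in category]
    intercept, alpha, _ = fit_line(log_x, log_y)
    return 10 ** intercept, alpha


def fit_piecewise(total, category, min_points=3):
    """Two independent line segments in log-log space (hypothetical helper).

    Candidate breakpoints are the observed total counts; the split with the
    smallest combined squared error is kept. Each segment needs at least
    `min_points` genomes.
    """
    points = sorted(zip(total, category))
    log_x = [math.log10(t) for t, _ in points]
    log_y = [math.log10(k) for _, k in points]
    best = None
    for i in range(min_points, len(points) - min_points + 1):
        _, slope_before, sse_before = fit_line(log_x[:i], log_y[:i])
        _, slope_after, sse_after = fit_line(log_x[i:], log_y[i:])
        combined = sse_before + sse_after
        if best is None or combined < best[0]:
            # Breakpoint reported as the first total count of the upper segment.
            best = (combined, points[i][0], slope_before, slope_after)
    if best is None:
        raise ValueError("need at least 2 * min_points genomes")
    _, break_total, slope_before, slope_after = best
    return break_total, slope_before, slope_after
```

For example, `fit_power_law([100, 200, 400, 800, 1600], [100, 400, 1600, 6400, 25600])` returns approximately (0.01, 2.0): an exponent of 2 means that doubling the genome multiplies the category count by 2² = 4, the quadratic case mentioned for transcription regulation. In `fit_piecewise`, a `slope_before` larger than `slope_after` corresponds to the pattern described for most bacterial breakpoints (fast scaling that slows down), and the reverse to the archaeal pattern; a real analysis would also need to test whether the breakpoint is statistically supported, for example against a single-line fit.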

It's also interesting to observe these differences in scaling pattern before and after the breakpoint in archaea and bacteria. We thought these differences might be caused by phylum-specific scaling, so we broke the domains down into individual phyla and found considerable variation among the phyla in every category. For example, category H here, coenzyme transport and metabolism, also happens to be the most variable category across the phyla.

So we thought this phylum-specific scaling might be causing the positioning of the breakpoints we observed previously. I placed these breakpoints on the total protein annotations to see if there was any specific pattern, but we can see that individual phyla span the breakpoints, and there is no specific preference for a phylum to be on either side of a breakpoint. So maybe taxonomy is not causing the positioning of these breakpoints, and other factors, such as physiological or environmental factors, may be causing them.

We were also interested in the CPR and DPANN groups. These groups have extremely small genomes and lack major metabolic pathways, so we compared them with unicellular eukaryotes and Asgard archaea. For some categories, such as category O, the scaling is very similar, but for others, such as category C, it is very different. This shows that there are different ways in which an organism can adapt as its genome grows in size.

I've just discussed a few key results in my talk, so if you'd like to discuss them further, I'd be interested; please come by panel two for the poster. Thank